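Transferring multiple objects between bins is a common task in many applications. In robotics, the standard approach is to pick up and transfer one object at a time. However, grasping several objects and transferring them together in one motion is more efficient. This paper presents a set of novel strategies for efficiently grasping multiple objects in one bin and transferring them to another. The strategies enable a robotic hand to identify an optimal ready hand configuration (pre-grasp) and to calculate a flexion synergy based on the desired quantity of objects to be grasped. The paper also presents an approach that uses a Markov decision process (MDP) to model the pick-transfer routine when the required quantity exceeds the capacity of a single grasp. Using the MDP model, the proposed approach can generate an optimal pick-transfer routine that minimizes the number of transfers, which serves as the measure of efficiency. The approach has been evaluated both in a simulation environment and on a real robotic system. The results show that it reduces the number of transfers by 59% and the number of lifts by 58% compared with an optimal single-object pick-transfer solution.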
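The MDP formulation above can be illustrated with a small sketch. The state is the number of objects still needed, an action is the quantity the hand aims to grasp, and the cost is one per transfer. The outcome probabilities in `OUTCOMES`, the function name `expected_transfers`, and the rule that the hand never aims for more than is still needed are all assumptions made for illustration, not values or design choices taken from the paper.

```python
from functools import lru_cache

# Hypothetical outcome model (made-up numbers): OUTCOMES[k][j] is the
# probability that a grasp aimed at k objects actually transfers j objects.
OUTCOMES = {
    1: {0: 0.05, 1: 0.95},
    2: {0: 0.05, 1: 0.25, 2: 0.70},
    3: {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.40},
}


@lru_cache(maxsize=None)
def expected_transfers(remaining):
    """Return (minimal expected number of transfers, best target quantity)."""
    if remaining <= 0:
        return 0.0, None
    best = (float("inf"), None)
    for target, dist in OUTCOMES.items():
        if target > remaining:
            continue  # assumption: never aim for more than is still needed
        p_stay = dist.get(0, 0.0)  # a failed grasp leaves the state unchanged
        future = sum(
            p * expected_transfers(remaining - j)[0]
            for j, p in dist.items()
            if j > 0
        )
        # Bellman equation with the self-loop solved in closed form:
        # V = 1 + p_stay * V + future  =>  V = (1 + future) / (1 - p_stay)
        cost = (1.0 + future) / (1.0 - p_stay)
        best = min(best, (cost, target), key=lambda pair: pair[0])
    return best
```

Calling `expected_transfers(6)` returns the expected number of transfers under the best policy together with the quantity to aim for first. Under this toy model the single-object policy is always available as a fallback, so the optimal routine can never need more expected transfers than picking one object at a time.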
translated by 谷歌翻译
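A human hand can grasp a desired number of objects at once from a pile based solely on tactile sensing. To do the same, a robot needs to grasp within a pile, sense the number of objects in the grasp before lifting, and predict how many objects will remain in the grasp after lifting. This is a challenging problem because, when the prediction is made, the robotic hand is still in the pile and the objects in the grasp are not observable to vision systems. Moreover, some objects that are in the grasp before lifting may fall out when the hand is lifted, because they were supported by other objects in the pile rather than by the fingers. A robotic hand should therefore use its tactile sensors to sense the number of grasped objects before lifting. This paper presents novel multi-object grasp analysis methods for solving this problem. They include a grasp volume calculation, a tactile force analysis, and a data-driven deep learning approach. The methods have been implemented on a Barrett hand and evaluated both in simulation and in a real setup with a robotic system. The evaluation shows that once the Barrett hand has grasped multiple objects in the pile, the data-driven model can predict, before lifting, the number of objects that will remain in the hand after lifting. The root-mean-square errors of our approach are 0.74 for balls and 0.58 for cubes in simulation, and 1.06 for balls and 1.45 for cubes on the real system.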
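For readers unfamiliar with the metric, the root-mean-square error reported above is measured in units of objects: it is the square root of the mean squared difference between the predicted and the actual number of objects left in the hand. A minimal sketch follows; the function name `rmse` and the example counts are made up for illustration and are not data from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: predicted vs. actual number of objects after three lifts.
example_error = rmse([3, 4, 5], [3, 5, 3])
```

An RMSE near 1 therefore means the prediction is typically off by about one object.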
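Because the medical imaging community lacks high-quality annotations, semi-supervised learning methods are highly valued in image semantic segmentation tasks. This paper presents an advanced consistency-aware, pseudo-label-based self-ensembling approach that fully exploits the power of the Vision Transformer (ViT) and the Convolutional Neural Network (CNN). The proposed framework consists of a feature-learning module, in which ViT and CNN enhance each other, and a guidance module that is robust for consistency-aware purposes. In the feature-learning module, pseudo labels are inferred and used recurrently and separately by the CNN view and the ViT view to expand the dataset, so that the two views benefit each other. Meanwhile, a perturbation scheme is designed for the feature-learning module, and averaged network weights are used to build the guidance module. In this way, the framework combines the feature-learning strengths of CNN and ViT, improves performance through dual-view co-training, and enables consistency-aware supervision in a semi-supervised manner. A topological exploration of all alternative supervision modes with CNN and ViT is validated in detail, demonstrating the most promising performance and the specific setting of our method on semi-supervised medical image segmentation tasks. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark dataset across a variety of metrics. The code is publicly available.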
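The abstract states only that averaged network weights are used to build the guidance module. One common way to do this, assumed here purely for illustration, is an exponential moving average (EMA) of the learner's weights, as in mean-teacher style training. The function name `ema_update`, the `decay` parameter, and the flat-list weight representation are hypothetical.

```python
def ema_update(guidance_weights, learner_weights, decay=0.99):
    """Return guidance weights moved slightly toward the learner weights.

    Assumption: weights are flat lists of floats; a real model would apply
    the same rule to every parameter tensor.
    """
    return [
        decay * g + (1.0 - decay) * w
        for g, w in zip(guidance_weights, learner_weights)
    ]


# Usage: after each optimizer step on the learner, refresh the guidance copy.
guidance = [1.0, 0.0]
learner = [0.0, 1.0]
guidance = ema_update(guidance, learner, decay=0.9)
```

A decay close to 1 makes the guidance module change slowly, which tends to give smoother targets for consistency training than the rapidly changing learner weights.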
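In this work, we study unsupervised domain adaptation (UDA) in a challenging self-supervised setting. One of the difficulties is how to learn task discrimination in the absence of target labels. Unlike previous work, which directly aligns cross-domain distributions or relies on gradient reversal, we propose Domain Confused Contrastive Learning (DCCL), which bridges the source and target domains via domain puzzles and retains discriminative representations after adaptation. Technically, DCCL searches for the most domain-challenging direction and carefully crafts domain-confused augmentations as positive pairs. It then contrastively encourages the model to pull representations toward the other domain, thereby learning more stable and effective domain invariances. We also investigate whether contrastive learning necessarily helps UDA when other data augmentations are used. Extensive experiments show that DCCL significantly outperforms the baselines.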
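The contrastive step described above can be sketched with an InfoNCE-style loss, which is a common choice for contrastive learning. The abstract does not state the exact loss DCCL uses, so the loss form, the function names `cosine` and `info_nce`, the `temperature` value, and the toy vectors are all assumptions. The sketch also does not cover how the domain-confused augmentation itself is crafted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy check: the loss is small when the positive (here standing in for a
# domain-confused augmentation) is close to the anchor and far from negatives.
low_loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
high_loss = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss pulls the anchor representation toward its positive pair and pushes it away from the negatives.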
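Molecular dynamics (MD) simulation predicts the trajectories of atoms by solving Newton's equations of motion with a numerical integrator. Due to physical constraints, the integrator's time step must be small to maintain sufficient accuracy, which limits simulation efficiency. To address this, we introduce MDNet, a model based on graph neural networks (GNNs) that predicts the evolution of coordinates and momenta with large time steps. In addition, because its complexity is linear in the system size, MDNet scales easily to larger systems. We demonstrate the performance of MDNet on a 4000-atom system with large time steps, and show that MDNet predicts equilibrium and transport properties that agree well with standard MD simulations.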
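The time-step limitation of a conventional integrator can be seen in a toy example. The sketch below uses velocity Verlet, one widely used MD integrator, on a one-dimensional harmonic oscillator. The choice of integrator, the oscillator, and all names and parameter values are assumptions for illustration; the abstract does not say which integrator the reference simulations use.

```python
def verlet_step(x, v, dt, mass=1.0, k=1.0):
    """Advance position and velocity by one velocity Verlet step (F = -k x)."""
    v_half = v + 0.5 * dt * (-k * x) / mass
    x_new = x + dt * v_half
    v_new = v_half + 0.5 * dt * (-k * x_new) / mass
    return x_new, v_new


def energy(x, v, mass=1.0, k=1.0):
    """Total energy: kinetic plus harmonic potential."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def max_energy_error(dt, total_time=10.0):
    """Largest deviation from the initial energy along the trajectory."""
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(round(total_time / dt)):
        x, v = verlet_step(x, v, dt)
        worst = max(worst, abs(energy(x, v) - e0))
    return worst


small_step_error = max_energy_error(0.01)
large_step_error = max_energy_error(0.5)
```

Comparing `small_step_error` with `large_step_error` shows the energy error growing as the step gets larger, which is why a learned model that stays accurate at large steps is attractive.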
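Most publicly available datasets for image classification are single-labeled, whereas images in daily life are inherently multi-labeled. This annotation gap causes many pre-trained single-label classification models to fail in practical scenarios. The issue is even more pressing for aerial images: aerial data collected from sensors naturally cover a relatively large land area with multiple labels, while the widely available annotated aerial datasets (e.g., UCM, AID) are single-labeled. Because manually annotating multi-label aerial images would be time- and labor-intensive, we propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning. SCIDA is weakly supervised, i.e., it automatically learns a multi-label image classification model from massive, publicly available single-label images. To achieve this, we propose a novel Label-Wise self-Correction (LWC) module to better explore the underlying label correlations. This module also makes unsupervised domain adaptation (UDA) from single-label to multi-label data possible. For training, the proposed model uses only single-label information and requires no prior knowledge of multi-labeled data; it then predicts labels for multi-label aerial images. In our experiments, the proposed model is trained on the single-labeled MAI-AID-S and MAI-UCM-S datasets and tested directly on our collected Multi-scene Aerial Image (MAI) dataset.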
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are related. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
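The masking procedure described above can be sketched as follows. Each random mask keeps a subset of demonstration frames, a model is retrained on the kept frames, and the resulting environment return is credited to every frame the mask kept. The function `importance_map`, its parameters, the per-frame averaging, and the stand-in `toy_return` (which replaces the expensive retrain-and-evaluate step) are all hypothetical and only illustrate the general idea.

```python
import random


def importance_map(num_frames, train_and_evaluate, num_masks=500,
                   keep_prob=0.5, seed=0):
    """Average the return of every masked run over the frames it kept."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        score = train_and_evaluate(mask)  # retrain on kept frames, get return
        for i, kept in enumerate(mask):
            if kept:
                weighted[i] += score
                counts[i] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]


def toy_return(mask):
    """Stand-in for retraining: frame 2 matters far more than the others."""
    return 10.0 * mask[2] + sum(mask)


scores = importance_map(5, toy_return)
most_important = max(range(5), key=scores.__getitem__)
```

In this toy setup the frame that contributes most to the return receives the highest score. In a real pipeline `train_and_evaluate` would retrain the IL model on the masked demonstrations and roll it out in the environment, which is far more expensive than the stand-in used here.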
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.